Abstract

We show that a general purpose clusterization algorithm, Deterministic Annealing,
can be adapted to the problem of jet identification in particle production by high
energy collisions. In particular we consider the problem of jet searching in events
generated at hadronic colliders. Deterministic Annealing is able to reproduce the
results obtained by traditional jet algorithms and to exhibit a higher degree of
flexibility.

PACS Numbers: 13.87.-a, 13.38.Be, 05.10.-a, 45.10.Db



1 Introduction 



In high energy hadron-hadron collisions, events with high transverse energy
are characterized by highly collimated particle jets, reflecting hard scattering
processes at parton level. Radiation and pair production processes hide
the information on the original partons' momenta. To bridge the gulf between
experimental results, expressed in terms of hadron properties, and the theory,
whose ingredients are quarks and gluons, a reconstruction process is
needed. By this process hadrons in the final state are grouped into jets, and
many dedicated algorithms have been proposed for this purpose. These algorithms,
which are reviewed in section 2, are reasonable recipes that take into
account geometrical considerations and theoretical prescriptions.
One can guess that in this way one is solving an optimization problem,
i.e. trying to minimize some cost function. This is exactly the basis of the
so-called clustering problem, in which one looks for the optimal partition of a
given set of objects into classes on the grounds of some similarity property.
This task is performed by minimizing a prescribed cost function that has to be
adapted to the problem under investigation. In a recent paper [1] it has been
shown that a particular clustering algorithm, the so-called Deterministic
Annealing (DA) [2,3,4], can be adapted to the study of hadronic jets in high
energy $e^+e^-$ scattering. Essentially, DA can give the same results as the
standard Durham algorithm in a faster way, as a consequence of a lower
computational complexity. In this work we try to extend the use of DA to
hadron-hadron collisions, taking into account the peculiarities of jet
production in this type of interaction. In particular, in this kind of
interaction only a part of the particles in the final state can be associated
with partons coming from a hard scattering process.

Deterministic Annealing, in a version that allows data analysis in terms of a 
number of clusters either fixed or variable, will be presented in section 3. In 
section 4 results from the application of this method to simulated events will 
be presented and compared with those obtained by a Cone algorithm. Section 
5 is dedicated to comments and conclusions. 



2 Jet clustering algorithms 



The need to associate energy and momentum of particles in the final state to
the four-momentum of unobservable partons is realized through jet clustering
algorithms.^1 The most common of them can be classified in two categories:

• Association algorithms that use an iterative procedure. For every pair of
  particles with four-momenta $p_i$ and $p_j$, a test variable
  $y_{ij} = f(p_i, p_j)$ is calculated. This test variable is then compared to
  a given threshold parameter $y_{cut}$, and the pair is recombined into a new
  pseudo-particle $k$ of four-momentum $p_k = p_i + p_j$ (E scheme, but other
  schemes have also been considered) provided that $y_{ij} < y_{cut}$. The
  algorithm is then reiterated on the new set of (pseudo-)particles and it
  stops when $y_{ij} > y_{cut}$ for all pairs. The number of pseudo-particles
  at the end of the algorithm counts the number of jets, which is therefore
  fixed by $y_{cut}$. The ancestor of these jet algorithms is the JADE
  algorithm [7,8], where the jet resolution variable is defined as

  $$ y_{ij} = \frac{2 E_i E_j (1 - \cos\theta_{ij})}{E_{vis}^2} , \qquad (1) $$

  where $E_{vis}$ is the visible energy, i.e. the sum of the energies of all
  particles observed in the final state, $E_i$ are the particle energies, and
  $\theta_{ij}$ their angular separation. The theoretical advantage of this
  recombination scheme lies in the absence of collinear and infrared
  singularities, as the regions of phase space where these divergences could
  be generated are automatically excluded. However, particles at very
  different angles can also be recombined into one pseudo-particle, and this
  can give rise to ghost jets along directions where no particles are present.
  This problem suggested modifying the test variable in the following way
  (Durham algorithm [9,10,11]):

  $$ y_{ij} = \frac{2 \min(E_i^2, E_j^2) (1 - \cos\theta_{ij})}{E_{vis}^2} . \qquad (2) $$

  Subsequently yet another variable was introduced,
  $v_{ij} = 2(1 - \cos\theta_{ij})$. First the pairs of particles are ordered
  according to this variable, then the preceding scheme is applied. If the
  recombination fails ($y_{ij} > y_{cut}$), the softest (pseudo-)particle is
  frozen and prevented from being an attractor for other particles. This
  mechanism prevents soft collinear particles from being the seed of unwanted
  jets. The algorithm that implements these new rules is known as the
  Cambridge algorithm [12].
• To a second class belong algorithms that associate particles in a jet only on
  the grounds of geometrical properties. The prototype is the Cone
  algorithm defined in the Snowmass Convention [13]. Here, in the first step,
  the few particles having a transverse energy $E_T$ greater than a fixed
  threshold $E_T^0$ are selected as seeds for jets. Subsequently the particles
  lying in a cone of given radius $R_0$ in the pseudorapidity-azimuth plane
  around each seed are associated with a jet, whose direction is fixed by an
  iterative procedure. More refined approaches consider the possibility of
  recombination and splitting of these proto-jets.

^1 For a review of these and other jet algorithms see [5]. For a review of the
Monte Carlo generators and their connections with the jet algorithms see [6].
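
The association scheme of the first category can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions, not the implementation used by any experiment: particles are hypothetical `(E, px, py, pz)` tuples with nonzero momentum, the Durham form of the resolution variable is used, the E scheme is used for recombination, and at each step the pair with the smallest $y_{ij}$ is merged first (a common convention that the text above does not spell out). The function names `durham_y` and `cluster` are invented for this example.

```python
import math

def durham_y(p_i, p_j, e_vis):
    """Durham resolution variable for two four-momenta (E, px, py, pz)."""
    mod_i = math.sqrt(sum(c * c for c in p_i[1:]))
    mod_j = math.sqrt(sum(c * c for c in p_j[1:]))
    cos_ij = sum(a * b for a, b in zip(p_i[1:], p_j[1:])) / (mod_i * mod_j)
    return 2.0 * min(p_i[0], p_j[0]) ** 2 * (1.0 - cos_ij) / e_vis ** 2

def cluster(particles, y_cut):
    """Merge pairs (E scheme) until every pair has y_ij >= y_cut."""
    e_vis = sum(p[0] for p in particles)  # visible energy of the event
    jets = [tuple(p) for p in particles]
    while len(jets) > 1:
        # Assumption: the pair with the smallest y_ij is tested first.
        y_min, a, b = min(
            (durham_y(jets[i], jets[j], e_vis), i, j)
            for i in range(len(jets)) for j in range(i + 1, len(jets)))
        if y_min >= y_cut:
            break
        merged = tuple(u + v for u, v in zip(jets[a], jets[b]))  # p_k = p_i + p_j
        jets = [p for k, p in enumerate(jets) if k not in (a, b)] + [merged]
    return jets

# Hypothetical four-particle event: two particles in each hemisphere.
event = [(10.0, 0.0, 0.0, 10.0), (5.0, 3.0, 0.0, 4.0),
         (10.0, 0.0, 0.0, -10.0), (5.0, -3.0, 0.0, -4.0)]
jets = cluster(event, y_cut=0.05)
```

The number of returned pseudo-particles depends only on `y_cut`, which is the sense in which the jet multiplicity is fixed by the threshold. The double loop over pairs at every step is also what makes this family of algorithms comparatively expensive.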

Here we stress an important difference between these two categories. While for
the algorithms of the first kind jets include all the particles and their number
can be fixed a priori, for the algorithms of the second kind the number of jets
is essentially determined by the number of particles used as seeds, and a varying
fraction of the particles is excluded from the classification. This is the reason
why the former scheme is used in the case of electron-positron scattering and the
latter in the case of hadronic collisions, where not all the particles are
produced in hard interactions.
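
For comparison, a seeded cone procedure of the second category can be sketched as follows. This is a deliberately simple illustration and not the Snowmass prescription in full: particles are hypothetical `(E_T, eta, phi)` tuples, there is no splitting or merging of overlapping proto-jets, and seeds that converge to the same axis are simply counted once. The names `delta_phi` and `cone_jets`, the tolerance and the iteration cap are all assumptions made for the example.

```python
import math

def delta_phi(a, b):
    """Azimuthal difference a - b wrapped into (-pi, pi]."""
    d = (a - b) % (2.0 * math.pi)
    return d - 2.0 * math.pi if d > math.pi else d

def cone_jets(particles, et_seed, r0, max_iter=50, tol=1e-9):
    """Return jets as (E_T, eta, phi); particles are (E_T, eta, phi) tuples."""
    jets = []
    for et, eta, phi in particles:
        if et <= et_seed:
            continue  # only particles above the threshold act as seeds
        axis_eta, axis_phi, et_sum = eta, phi, et
        for _ in range(max_iter):
            members = [p for p in particles
                       if math.hypot(p[1] - axis_eta, delta_phi(p[2], axis_phi)) < r0]
            if not members:
                break
            et_sum = sum(p[0] for p in members)
            # E_T-weighted centroid of the particles inside the cone
            new_eta = sum(p[0] * p[1] for p in members) / et_sum
            new_phi = axis_phi + sum(p[0] * delta_phi(p[2], axis_phi) for p in members) / et_sum
            moved = math.hypot(new_eta - axis_eta, new_phi - axis_phi)
            axis_eta, axis_phi = new_eta, new_phi
            if moved < tol:
                break
        # Assumption: seeds converging to an already found axis give no new jet.
        if not any(math.hypot(j[1] - axis_eta, delta_phi(j[2], axis_phi)) < 1e-6 for j in jets):
            jets.append((et_sum, axis_eta, axis_phi))
    return jets

# Hypothetical event: two hard clusters plus one soft, isolated particle.
event = [(50.0, 0.0, 0.0), (10.0, 0.2, 0.1), (40.0, 1.0, 3.0),
         (5.0, 1.1, 3.1), (1.0, -2.0, 1.5)]
jets = cone_jets(event, et_seed=2.0, r0=0.7)
```

In this toy event the soft particle at $\eta = -2$ is neither a seed nor inside any cone, so it is left out of the classification, which is the behavior described in the paragraph above.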



3 Deterministic Annealing 

As we said, the clustering problem consists of the optimal grouping of a set
of data points so that points in the same class are more similar than points
in different classes. Deterministic Annealing is inspired by an analogy to the
annealing procedure, which consists of maintaining a system at thermal
equilibrium while gradually lowering the temperature. The process ensures that,
in the limit of low temperature, the global free energy minimum is attained.
The word deterministic refers to the fact that, as we shall see, thermal
equilibrium is obtained by minimizing the free energy directly, as opposed to
the stochastic simulation used by Simulated Annealing [14]. We introduce here
a formulation of DA called Mass-Constrained Clustering (MCC) [3,4] that is
particularly suitable for our application. Indeed, in this formulation the
number of clusters is not fixed a priori, as it was in the previous application
of DA to the jet searching problem [1], but is the result of the calculation.

Let us consider two sets, the set of the data points $x \in X$ we want to classify
and the set of the vectors representative of the clusters $y \in Y$, also called
code-vectors. The MCC approach introduces an infinite number of code-vectors; at
each stage of the annealing process only a limited portion of them are distinct,
so one introduces a quantity $p_i$ denoting the fraction of code-vectors which
are coincident and represent the same cluster $i$. One defines also the local
distortion $d(x, y_i)$ between each data point $x$ and each effective code-vector
$y_i$. The global distortion $D$ is defined as

$$ D = \sum_x \sum_i p(x, y_i)\, d(x, y_i) = \sum_x p(x) \sum_i p(y_i|x)\, d(x, y_i) , \qquad (3) $$

where $p(x, y_i)$ is the joint probability distribution, $p(x)$ is the probability
of each data set element and $p(y_i|x)$ is the conditional probability relating
the element $x$ with code-vector $y_i$, i.e. the probability to associate $x$ with
cluster $i$. Following the analogy with a statistical physics system, $D$ plays
the role of the internal energy which, in the limit of zero temperature, one wants
to minimize. In this limit one obtains the hard clustering solution, in which
the association probabilities are zero or one. At finite temperature the minimum
of the Helmholtz free energy $F$ determines the distribution at thermal
equilibrium. This minimum is given by

$$ F^* = -T \sum_x p(x) \ln Z_x , \qquad (4) $$

where $Z_x$ is the partition function for the single data point,

$$ Z_x = \sum_i p_i\, e^{-d(x, y_i)/T} . \qquad (5) $$

As a consequence the conditional probabilities are given by the Gibbs
distribution

$$ p(y_i|x) = \frac{p_i\, e^{-d(x, y_i)/T}}{Z_x} . \qquad (6) $$

Imposing the free energy minimization under the constraint $\sum_i p_i = 1$, one
obtains that the optimal set of code-vectors $\{y_i\}$ must satisfy the equations

$$ \sum_x p(x)\, p(y_i|x)\, \nabla_{y_i} d(x, y_i) = 0 , \qquad (7) $$

while

$$ p_i = \sum_x p(x)\, p(y_i|x) = p(y_i) . \qquad (8) $$



From eq. (7) one obtains that the positions of the code-vectors are determined,
for a squared error distortion $d(x, y_i) = |x - y_i|^2$, by

$$ y_i = \frac{\sum_x x\, p(x)\, p(y_i|x)}{p(y_i)} . \qquad (9) $$

The annealing process starts at high temperature. From (6) it is clear that in
this regime the association probabilities are uniform, the system is completely
disordered and the code-vector set collapses to a single point. This unique
code-vector has $p(y_1) = 1$, every point is associated with it with
probability 1, $p(y_1|x) = 1$, and equation (9) gives the position of the
centroid of the data set, $y_1 = \sum_x p(x)\, x$. During the cooling process one
encounters phase transitions, which consist of an increase in the number of
code-vectors through a sequence of cluster splittings. The temperature plays the
role of the resolution parameter at which the data set is clustered, and a
complete hierarchical clustering can be obtained, up to the extreme situation at
zero temperature when there is a code-vector for each point of the data set.
This process is described in Fig. 1, where the behavior of the free energy $F$
as a function of $\beta = 1/T$ is shown for a typical event among those analyzed
in the next section.

Fig. 1. The phase diagram for a simulated p-p scattering event (see sect. 4)

From a practical point of view, Mass-Constrained Clustering can be implemented
by an algorithm that we briefly sketch here. Starting from a low value of
$\beta$, one introduces two clusters with coordinates slightly perturbed with
respect to the centroid coordinates and equal probability for every point to be
associated with each cluster. Then one minimizes the free energy by iterating
the equations

$$ p(y_i) = \sum_x p(x)\, p(y_i|x) , \qquad (10) $$

$$ p(y_i|x) = \frac{p(y_i)\, e^{-\beta d(x, y_i)}}{\sum_j p(y_j)\, e^{-\beta d(x, y_j)}} , \qquad (11) $$

$$ y_i = \frac{\sum_x x\, p(x)\, p(y_i|x)}{p(y_i)} \qquad (12) $$

until one finds convergence in $y_i$. If $\beta$ is low enough, it turns out that
these two clusters are coincident. The next step is to cool the system,
$\beta \to \alpha\beta$ (with $\alpha > 1$), always iterating equations (10),
(11) and (12), until a solution corresponding to two different code-vectors
(the first phase transition) is encountered. Subsequently one goes on by
introducing, for each code-vector location, two perturbed code-vectors which
share the association probability of each data point, raising $\beta$ and
determining the new code-vector coordinates. Each pair of code-vectors will
merge until a critical $\beta$ is reached, at which point one of the pairs will
originate two effective code-vectors. The process is stopped when a sufficient
resolution ($\beta$ value or number of clusters) is reached.
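
The procedure just described can be sketched in a few lines of Python. This is a minimal illustration under several assumptions, not the code used for the results of this paper: data points are plain two-dimensional tuples with a Euclidean squared distance (no azimuthal periodicity), the perturbation is a small seeded random displacement, code-vectors closer than a hypothetical tolerance `eps` are treated as coincident and fused, and every cluster is assumed to keep a nonzero mass. The function names and the default values of `alpha`, `delta`, `eps`, `tol` and `max_iter` are invented for the example.

```python
import math
import random

def gibbs(x, ys, ps, beta):
    """Eq. (11): association probabilities p(y_i|x) for one data point x."""
    d = [(x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2 for y in ys]
    d0 = min(d)  # shift the exponents for numerical stability
    w = [p * math.exp(-beta * (di - d0)) for p, di in zip(ps, d)]
    z = sum(w)
    return [wi / z for wi in w]

def relax(xs, px, ys, ps, beta, tol=1e-9, max_iter=1000):
    """Iterate eqs. (10)-(12) at fixed beta until the code-vectors stop moving."""
    for _ in range(max_iter):
        assoc = [gibbs(x, ys, ps, beta) for x in xs]
        n = len(ys)
        new_ps = [sum(w * a[i] for w, a in zip(px, assoc)) for i in range(n)]  # eq. (10)
        new_ys = [tuple(sum(w * a[i] * x[k] for w, a, x in zip(px, assoc, xs)) / new_ps[i]
                        for k in range(2)) for i in range(n)]                  # eq. (12)
        shift = max(math.dist(a, b) for a, b in zip(ys, new_ys))
        ys, ps = new_ys, new_ps
        if shift < tol:
            break
    return ys, ps

def merge(ys, ps, eps):
    """Fuse code-vectors closer than eps, adding their masses."""
    out_y, out_p = [], []
    for y, p in zip(ys, ps):
        for k, z in enumerate(out_y):
            if math.dist(y, z) < eps:
                tot = out_p[k] + p
                out_y[k] = tuple((out_p[k] * a + p * b) / tot for a, b in zip(z, y))
                out_p[k] = tot
                break
        else:
            out_y.append(y)
            out_p.append(p)
    return out_y, out_p

def anneal(xs, px, beta0, beta_max, alpha=1.1, delta=1e-3, eps=1e-2, seed=0):
    """Return effective code-vectors and their probabilities at the last beta."""
    rng = random.Random(seed)
    centroid = tuple(sum(w * x[k] for w, x in zip(px, xs)) for k in range(2))
    ys, ps = [centroid], [1.0]
    beta = beta0
    while beta <= beta_max:
        trial_y, trial_p = [], []
        for y, p in zip(ys, ps):
            # Replace each code-vector by a perturbed pair sharing its mass.
            r = (rng.uniform(-delta, delta), rng.uniform(-delta, delta))
            trial_y += [(y[0] + r[0], y[1] + r[1]), (y[0] - r[0], y[1] - r[1])]
            trial_p += [p / 2.0, p / 2.0]
        ys, ps = merge(*relax(xs, px, trial_y, trial_p, beta), eps)
        beta *= alpha  # cooling step: beta -> alpha * beta
    return ys, ps

# Hypothetical data: two compact groups of equally weighted points.
points = [(-0.1, 0.0), (0.1, 0.0), (0.0, 0.1), (0.0, -0.1),
          (2.9, 0.0), (3.1, 0.0), (3.0, 0.1), (3.0, -0.1)]
weights = [1.0 / len(points)] * len(points)
ys_low, ps_low = anneal(points, weights, beta0=0.05, beta_max=0.1)
ys_high, ps_high = anneal(points, weights, beta0=0.05, beta_max=2.0)
```

With this toy data set the annealing stopped at low $\beta$ leaves a single code-vector at the centroid, while continuing to higher $\beta$ goes through the first phase transition and yields one code-vector per group, each carrying half of the total probability.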

In order to apply this algorithm to the problem of jet search in hadronic
collisions we must first choose a distortion measure. We considered the squared
error distortion in the pseudorapidity-azimuth plane

$$ d(x, y_i) = (\eta_x - \eta_i)^2 + (\phi_x - \phi_i)^2 . \qquad (13) $$

The other ingredient is the weight $p(x)$ to assign to each particle. As our
purpose was to make a comparison with the Cone algorithm, we assigned to
a particle $x$ with transverse energy $E_T(x)$ the weight

$$ p(x) = \frac{E_T(x)}{\sum_{x'} E_T(x')} . \qquad (14) $$

This assignment, together with (13), has the interesting property that the
coordinates of a jet, as defined by the Cone algorithm,

$$ \eta^J = \frac{1}{E_T^J} \sum E_T\, \eta , \qquad \phi^J = \frac{1}{E_T^J} \sum E_T\, \phi , \qquad E_T^J = \sum E_T , \qquad (15) $$

where the sums run over the particles in the jet, are exactly recovered by the
DA algorithm in the limit of hard clustering ($\beta \to \infty$). In this
limit, indeed, the association probabilities of each data point (particle) to a
cluster (jet) in eq. (11) become 0 or 1, and from eq. (12) one obtains exactly
eqs. (15).
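
The hard clustering limit can be checked explicitly. The short derivation below is a sketch that assumes the weight of a particle is its transverse energy normalized to the total transverse energy of the event; the symbol $E_T^{tot}$ and the notation $x \in i$ for the particles assigned to cluster $i$ are introduced here only for this purpose.

```latex
% Hard clustering: p(y_i|x) = 1 if x belongs to jet i, and 0 otherwise.
% Assumed weight: p(x) = E_T(x) / E_T^{tot}, with E_T^{tot} = \sum_x E_T(x).
p(y_i) = \sum_x p(x)\, p(y_i|x)
       = \frac{1}{E_T^{tot}} \sum_{x \in i} E_T(x)
       = \frac{E_T^{J}}{E_T^{tot}} ,

y_i = \frac{\sum_x x\, p(x)\, p(y_i|x)}{p(y_i)}
    = \frac{\sum_{x \in i} E_T(x)\, x \,/\, E_T^{tot}}{E_T^{J} \,/\, E_T^{tot}}
    = \frac{1}{E_T^{J}} \sum_{x \in i} E_T(x)\, x ,
\qquad x = (\eta_x, \phi_x) .
```

The normalization $E_T^{tot}$ cancels, so each code-vector is the $E_T$-weighted centroid of its own particles, and the cluster probability is the fraction of the total transverse energy carried by the jet.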



4 Results and discussions 

We are now in a position to explore the possibility of applying the
Mass-Constrained Clustering version of Deterministic Annealing to the problem of
jet search at hadronic colliders. To this purpose we generated 2000 events from
proton-proton scattering at 14 TeV with the PYTHIA [15,16] Monte Carlo
generator; a bias in the transverse energy $E_T$ of the initial partons was
introduced, corresponding to $E_T = 100$ GeV for 1000 events (sample A) and
$E_T = 200$ GeV for the other 1000 events (sample B); initial and final state
radiation was allowed. With this bias, a clear back-to-back two-jet structure
is expected. Results from the application of DA were systematically compared
with those obtained by the Cone algorithm described in section 2: the Cone
algorithm parameters, the transverse energy threshold $E_T^0$ and the cone
radius $R_0$, were fixed to 2 GeV and 0.7, respectively.

We first calculated two quantities that can be easily used for a comparison with
the Cone algorithm. The first quantity is the mean distance of each particle
from a code-vector $j$, defined as

[equation (16)]

where $N_c$ is the number of clusters found. $\langle d \rangle$ is a decreasing
function of $\beta$, attaining its maximum value at $\beta = 0$, when there is
only one cluster, and its minimum value, zero, at $\beta = \infty$, when every
particle is a cluster by itself. This quantity, averaged over all the events, is
shown as a function of $\beta$ in the left part of fig. 2. We can see that there
is no practical difference between the two analyzed samples: in either case
$\langle d \rangle$ decreases quickly for low values of $\beta$, due to the
growth in the number of clusters, then the descent becomes very slow. This
behavior signals that the particle distribution in the events we are analyzing
is such that the initial partition into a few clusters is preserved when
$\beta$ is increased, apart from fragments of low weight. This robustness is
confirmed by the second quantity we calculated, the mean number of clusters
$\langle N_c \rangle$, whose behavior as a function of $\beta$ is shown in the
right part of fig. 2. We see that the region of extreme fragmentation, which
ends the clustering process, is still far away at $\beta = 4$.

Fig. 2. First results from the application of the DA algorithm to simulated
events. Left side: $\langle d \rangle$, averaged over all the events, vs
$\beta$. Right side: the mean number of clusters $\langle N_c \rangle$ vs
$\beta$.

How can we determine the two-jet nature of our events? To answer this question
we note that the DA recipe cannot yet be considered complete, because two
problems remain. The first is that the annealing process must be stopped at
some $\beta$ value, to avoid the extreme cluster fragmentation produced by the
$\beta \to \infty$ limit. The second arises because only part of the clusters
can be attributed to the scattered partons. Therefore, once we choose $\beta$,
we also need a criterion to select real jets among the clusters. In the Cone
algorithm these issues are controlled, as mentioned before, through two
parameters: the transverse energy threshold $E_T^0$ and the cone radius $R_0$.
We recall that DA introduces a probability measure for the clusters, the
expression (10). A peculiarity of these probabilities is that the two-jet
nature of the events analyzed here produces, in the $\beta$ region where
$\langle d \rangle$ has a smooth behavior ($\beta > 1$), two clusters of high
probability, while only a small fraction of unity is assigned to the remaining
clusters. To illustrate this feature, the probability distributions for the
five most probable clusters at $\beta = 1.4$ are shown in fig. 3 for the events
from sample A. A small value of the cluster probability reflects the fact that
the particles assigned to this cluster with a good association probability are
few and have a small weight (transverse energy). So it is natural to consider
as jets only the clusters that survive a cut in the probability value. For
example, we see in fig. 4 how a threshold at $p_0 = 0.15$ influences the $\beta$
dependence of the mean number of clusters $\langle N_c \rangle$: this quantity
now goes rapidly to a value close to 2, i.e. the expected value for our sample.
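
The selection criterion amounts to a simple filter on the cluster probabilities. The following Python fragment is only an illustration: the function name `select_jets` and the numbers in the toy event are invented, chosen to mimic two dominant clusters accompanied by a few soft fragments.

```python
def select_jets(code_vectors, probabilities, p0):
    """Keep only the clusters whose probability p(y_i) exceeds the cut p0."""
    return [(y, p) for y, p in zip(code_vectors, probabilities) if p > p0]

# Hypothetical event: (eta, phi) of five code-vectors and their probabilities.
ys = [(0.1, 0.2), (-0.3, 3.3), (2.5, 1.0), (-2.0, 5.0), (1.2, 4.1)]
ps = [0.46, 0.41, 0.06, 0.04, 0.03]

jets = select_jets(ys, ps, p0=0.15)
print(len(jets))  # 2
```

Lowering `p0` admits progressively softer clusters, so the cut plays a role loosely analogous to a transverse energy threshold.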

At this point we are ready to illustrate how DA is able to reproduce the
results obtained by the Cone algorithm. We performed the annealing process
up to a $\beta$ value of 1.4 and accepted only clusters with probability
greater than $p_0 = 0.025$. With these values we obtained
$\langle d \rangle = 0.69 \pm 0.11$, close to the value $R_0 = 0.7$ used for
the Cone algorithm. This could be expected because $\beta$ has an effect on the
association probability of a particle to a cluster (see (11)) that is
comparable to that of the parameter $R_0$ for the Cone algorithm, if one puts
$\beta \simeq 1/(2R_0^2)$. No fine tuning of these parameters was performed,
because this is not the aim of this article.

Fig. 3. The probability distributions for the five most probable clusters at
$\beta = 1.4$.

Fig. 4. The mean number of clusters $\langle N_c \rangle$ vs $\beta$. Only
clusters having probability greater than $p_0 = 0.15$ have been considered.

The comparison between the two algorithms is reported in fig. 5 for two
observables: the number of clusters and their transverse energy distribution.
Some differences can be noted; in particular there is a more pronounced tail in
the $E_T$ distribution for the DA algorithm. This can be easily explained by the
fact that, assuming $\langle d \rangle \simeq R_0$, $R_0$ is a sharp threshold
for the Cone algorithm, while for the DA algorithm $\langle d \rangle$ is the
mean cluster radius. Another minor discrepancy is in the $N_c$ distribution
that, for the DA algorithm, is slightly shifted to higher values of $N_c$. We
could get rid of these differences by modifying the values of $\beta$ and $p_0$
but, as we said, we did not consider this worthwhile, not least because we used
a very simple Cone algorithm in which, for example, no recombination or
splitting mechanism for proto-jets is considered.

Fig. 5. The clusters number distribution (top) and the clusters transverse
energy distribution (bottom) for the DA algorithm (solid line) and the Cone
algorithm (dashed line). (Events are from sample A.)

A more interesting question to ask is which algorithm better reproduces the
properties of the partons originating the jets. To this purpose we introduced
two variables for each parton participating in the hard initial scattering and
for the cluster nearest to it in direction. The first variable, $\delta$,
describes the ability to identify the parton direction:

[equation (17)]



where $\alpha$ is the angular separation between the parton and the jet axis.
The other quantity measures the ability to trace the transverse energy of the
parton:

$$ \Delta = \frac{E_{T,p} - E_{T,c}}{E_{T,p}} , \qquad (18) $$

where $E_{T,p}$ and $E_{T,c}$ are the transverse energies of the parton and the
jet respectively. Their distributions for the two algorithms and the two data
samples are shown in fig. 6. We can see that, while for the $\delta$
distribution there are no practical differences between the two algorithms, DA
seems to be more efficient in recovering the hard parton transverse energy.

Fig. 6. The distribution of the angular distance between partons and jets (left)
and the clusters transverse energy distribution (right) for the DA algorithm
(solid line) and the Cone algorithm (dashed line).
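
The parton-to-cluster comparison can be illustrated as follows. The sketch makes two assumptions that are not fixed by the text: "nearest in direction" is taken to mean the smallest distance in the pseudorapidity-azimuth plane, and the transverse energy residual is taken as parton minus cluster, divided by the parton value. Partons and clusters are hypothetical `(E_T, eta, phi)` tuples, and the function names are invented for the example.

```python
import math

def delta_phi(a, b):
    """Azimuthal difference a - b wrapped into (-pi, pi]."""
    d = (a - b) % (2.0 * math.pi)
    return d - 2.0 * math.pi if d > math.pi else d

def nearest_cluster(parton, clusters):
    """Cluster closest to the parton in the (eta, phi) plane (assumed metric)."""
    return min(clusters,
               key=lambda c: math.hypot(c[1] - parton[1], delta_phi(c[2], parton[2])))

def et_residual(parton, cluster):
    """Relative transverse energy difference (assumed sign: parton minus cluster)."""
    return (parton[0] - cluster[0]) / parton[0]

# Hypothetical parton and two reconstructed clusters.
parton = (100.0, 0.50, 1.00)
clusters = [(90.0, 0.52, 1.03), (95.0, -0.40, 4.10)]

match = nearest_cluster(parton, clusters)
residual = et_residual(parton, match)
```

Here the first cluster is matched to the parton and the residual is 0.1, i.e. the cluster recovers 90% of the parton transverse energy.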



5 Conclusions 



We have compared the results found by the Cone algorithm with those ob- 
tained by a clustering algorithm based on the Deterministic Annealing proce- 
dure. The latter has been adapted to the process studied in this article, i.e. 
jet identification in particle production by high energy hadronic collisions, by 
introducing a suitable distortion measure and using temperature and cluster 
probability as parameters. Other choices are possible. For example one could 
take into account that the phase transition producing the splitting of a cluster 
occurs at a temperature proportional to the variance of the cluster itself [4]. So 
a characteristic of well defined clusters is that they are stable for a wide range 
of temperature and this stability property could be used in jet recognition. 
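
The relation between splitting temperature and cluster variance can be written explicitly. The block below restates the standard result for a squared error distortion as given in the DA literature (ref. [4]); the symbols $C_{x|y_i}$, $\lambda_{\max}$, $T_c$ and $\beta_c$ are notation introduced here for illustration and are not used elsewhere in this paper.

```latex
% Covariance of the data as seen from code-vector y_i
C_{x|y_i} = \sum_x p(x|y_i)\, (x - y_i)(x - y_i)^{T} ,
\qquad
p(x|y_i) = \frac{p(x)\, p(y_i|x)}{p(y_i)} .

% Cluster i becomes unstable and splits when T falls below
T_c = 2\, \lambda_{\max}\!\left( C_{x|y_i} \right)
\quad \Longleftrightarrow \quad
\beta_c = \frac{1}{2\, \lambda_{\max}\!\left( C_{x|y_i} \right)} ,

% where \lambda_{\max} is the largest eigenvalue of C_{x|y_i}.
```

As a numerical illustration, a cluster whose largest variance is 0.25 in the $(\eta, \phi)$ plane would split at $\beta_c = 2$, while a more compact cluster with largest variance 0.05 would survive up to $\beta_c = 10$.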

From this preliminary analysis we cannot conclude that the DA algorithm
should be preferred to the Cone algorithm, even if the good results for the
number of clusters (fig. 4) and for the parton transverse energy should not be
neglected. In any case we think that the jet-physics community should consider
DA as a possible and serious alternative. The use of a geometrical definition of
jet appears indeed too simplistic with respect to the theoretical descriptions.
On the other hand, the DA algorithm looks at the properties of the density
distribution in momentum space, which is the reason why the recombination
and splitting mechanisms are automatically incorporated.

Moreover, there is another general question that could be solved in this
calculation scheme. It is true that the choice of jet definition is a matter of
convention and that the important thing is to use the same definition in
theoretical predictions and in experimental analysis. However, it cannot be
considered satisfactory that, while there is a unique theory explaining jet
production and properties, different definitions are used in hadronic and in
leptonic collisions. The purpose of this paper is to demonstrate that this
difficulty could be overcome by using the same algorithm, so that one can focus
all efforts on the most important question, i.e. the similarity property used to
decide whether two particles should be assigned to the same jet. Using a correct
definition of this quantity, indeed, one can take into account important
theoretical peculiarities, such as infrared and collinear safety or the
formation of "ghost" and "junk" jets [17,18]. These kinds of similarity measure
have been used, until now, only for $e^+e^-$ collisions and embodied in
algorithms with poor performance, since they have to loop over all pairs of
particles. We hope to have clarified (see also [1]) that they could be used for
any kind of interaction, without giving up the reduced computational complexity
that geometrical algorithms share with the method we propose.